Open source multi-platform NooJ for NLP
Abstract
The purpose of this demo is to introduce the linguistic development tool NooJ. The tool has been under development for a number of years, and it has a solid community of computational linguists developing grammars in two dozen languages, ranging from Arabic to Vietnamese. Despite its manifest capabilities and reputation, its appeal within the wider HLT community was limited because it was confined to the .NET Framework and was not open source. Under the auspices of the CESAR project, however, it has recently been made open source, and Java and Mono versions have been produced. In our view, this significant development justifies a concise but thorough description of the system that demonstrates its potential for deployment in a wide variety of settings and for a wide range of purposes. The paper describes the history, architecture, main functionalities, and potential of the system for teaching, research, and application development.
Similar resources
A dictionary and a grammar of French compounds (Un dictionnaire et une grammaire de composés français) [in French]
The paper introduces two resources for NLP, available under a GPL license: a dictionary of French compound words and a NooJ grammar that specifies a subset of compound patterns. Keywords: open source, resources, dictionary, grammar, compound words
Paraphrasing of Italian Support Verb Constructions based on Lexical and Grammatical Resources
Support verb constructions (SVCs) are verb-noun complexes that play a role in many natural language processing (NLP) tasks, such as Machine Translation (MT). They can be paraphrased with a full verb that preserves their meaning, which at the same time improves the raw MT output. In this paper, we discuss the creation of linguistic resources, namely a set of dictionaries and rules that can identify and pa...
Standard Arabic formalization and linguistic platform for its analysis
Since the early 1960s, starting with the first automatic analyzer proposed by David Cohen, one of the first theorists of NLP [1], research in natural language processing, and especially in the automatic processing of the Arabic language, has continued. In 1983, with a minimalist morphological analysis based on the theory that every Arabic form is generated from a root and a pattern, res...
Automatic transcription of 17th century English text in Contemporary English with NooJ: Method and Evaluation
Since 2006 we have undertaken to describe the differences between 17th-century English and contemporary English using NLP software. Studying a corpus spanning the whole century (tales of English travellers in the Ottoman Empire in the 17th century, Mary Astell's essay A Serious Proposal to the Ladies, and other literary texts) has enabled us to highlight various lexical, morphological or gra...
A Distributed Framework for NLP-Based Keyword and Keyphrase Extraction From Web Pages and Documents
The recent growth of the World Wide Web, at an ever-increasing rate, together with the number of resources available online, represents a massive source of knowledge for various research and business interests. For the most part, this knowledge is embedded in the textual content of web pages and documents, largely in unstructured natural-language form. In order ...
Publication date: 2012